An integrated circuit (102) contains a microprocessor
core (100), program memory (104) and separate data
storage (106, 108), together with analogue and digital
signal processing circuitry (110). The ALU (302) is
16 bits wide, but a 32-bit shift unit (312) is provided,
using a pair of 16-bit registers. The processor has a
fixed length instruction format, with an instruction set
including multiply and divide operations which use the
shift unit over several cycles. No interrupts are
provided. External pins of the integrated circuit allow for
single stepping and other debug operations, and provide a
serial interface (SIF) which allows external communication
of test data or working data as necessary. The serial
interface has four wires (SERIN, SEROUT, SERCLK,
SERLOADB), allowing handshaking with a master
apparatus, and allowing direct access to the memory space
(104-110) of the processor core (100), without specific
program control. Within each processor cycle, the
processor circuitry is divided into plural stages, and
latches are interposed between the stages to minimise
power consumption.
DATA PROCESSING CIRCUITS AND INTERFACES

The present invention relates primarily to single-chip
data processing devices, but also to microprocessors and
to digital circuits generally, and to interface circuits.

In the present day, many products incorporate
microprocessor based data processing circuits, for
example to process signals, to control internal operation
and/or to provide communications with users and external
devices. To provide compact and economical solutions,
particularly in mass-market portable products, it is
known to include microprocessor functionality together
with program and data storage and other specialised
circuitry, in a custom "chip" also known as an
application-specific integrated circuit (ASIC). Field
Programmable Gate Arrays (FPGA) such as those made by
Xilinx™, Actel™ and Altera™ may also be used to implement
such solutions.

However, for various reasons, the integrated
microprocessor functionality conventionally available to
an ASIC designer tends to be the same as that which would
be provided by a microprocessor designed for use as a
separate chip. The present inventors have recognised
that this results in inefficient use of space and power
in the ASIC solution, and in fact renders many potential
applications of ASIC technology impractical and/or
uneconomic.

Various aspects of the invention are defined in the 
appended claims, while the applicant reserves the right 
to claim any further aspects of the invention that may 
be disclosed herein. 
provided automatically, but only at times known in
advance to the programmer. Examples of such stimuli
include requests for communication from external devices,
and entry of a "sleeping" state for power conservation.
In the present embodiments, special instructions are
defined whereby the programmer can define fixed periods
in which external communication may take place, and fixed
points for entry into the sleeping state.

The various aspects of the invention will become apparent
from the following description of specific embodiments.
These are presented by way of example only, with
reference to the accompanying drawings, in which:

Fig. 1 shows the basic arrangement of an integrated
circuit including a processor embodying the invention;

Fig. 2 shows the programmer's model and instruction
format of the processor of the Fig. 1 circuit;

Fig. 3 shows the data architecture of the processor;

Figs. 4A and 4B illustrate the execution of multiply and
divide operations in the processor of Fig. 3;

Figs. 5A to 5G show waveforms illustrating various
functional features of the circuit of Fig. 1;

Fig. 6 is a schematic diagram of a serial interface (SIF)
similar to that of the circuit of Fig. 1;

Fig. 7 shows in more detail a shift register of the
serial interface of Fig. 6;

Fig. 8 is a flowchart showing operation of a master
Single chip solutions are ideal for many products, and
can comprise mixed mode CMOS circuitry such that analogue
and digital signal processing can be performed on-chip.
For some applications, however, there is a further need
for more logic processing, for functions such as user
interfaces, controlling E2PROMs, networks and protocol
conversions. Conventionally this requires a traditional
microprocessor or microcontroller, which increases
product size for two reasons. Firstly, the circuit board
of a product has two chips instead of one, and secondly,
the stand-by modes of conventional low-power processors
still consume substantial power, so that the product
requires a larger battery. The larger battery and bigger
circuit board make the product quite expensive, while the
need for communication between chips makes the system
more sensitive to electromagnetic interference. There
are many other disadvantages of a multiple chip solution.

There is clearly a desire for single chip solutions in
which the processor, program storage and RAM are embedded
in a single ASIC, to facilitate products such as:
portable wireless products, instrumentation, utilities
metering systems, low data rate radio systems, medical
diagnostics, safety critical and verifiable systems,
pagers, and certain sections of mobile telephones.
Certain conventional designs can be reduced to a single
ASIC by embedding a processor or a memory corresponding
to the conventional external processor, if the ASIC can
be made large enough. However, this goes only a small
way to addressing the problems identified above.

The processor which is the subject of the present
disclosure is a custom microprocessor which has been
developed for use in ASIC designs. The requirements and
trade-offs when designing for an ASIC are quite different
access to the address space and processor registers. It
is used to provide efficient IC manufacturing test
(allowing testing of the processor, ROM, RAM and memory
mapped peripherals), prototype device proving, and
production board level test. It can also be used for
ASIC to microprocessor communication in systems that use
a tied external microprocessor, eliminating the need for
separate interfaces and on-board circuitry for
communications and testability functions.

The processor uses separate program and data spaces (a
Harvard architecture), and the program instruction word
can thus be wider or narrower than the data word. This
is an example of where the on-chip nature of the design
favours a different solution from the traditional
microprocessor (von Neumann architecture), because the
need for separate address busses on-chip does not cause
any increase in the number of connections (pins)
off-chip. On both address busses, the timing allows either
synchronous or asynchronous devices. This is important
as synchronous devices are often smaller and lower power
than their asynchronous counterparts.

For verification and also to reduce gate count, the
processor does not support interrupts. Instead,
efficient polling instructions and the use of
sleep/wake-up allow efficient multi-event responses.
The processor has a RISC instruction set. Instructions
are single word and mostly execute in a single cycle.
Particular operations taking several cycles to execute
are the Multiply (16 cycles), Divide (16 cycles) and
Shift (variable) operations, which use the special shift
unit mentioned above. There are four addressing modes:
Immediate, Direct, Indexed by register X and Indexed by
register Y.
within the ASIC itself, according to the particular
application. Where high speed signal processing
functions are to be performed by the ASIC, it will
typically be desirable to have these performed by
dedicated circuitry rather than by the programmed
processor core described above, and to have less
computationally intensive, but more logically complex
parts of the required functionality implemented by the
processor under program control. Therefore, for example,
if a number of external analogue signals are to be sensed
with a high bandwidth, and combined in accordance with
predetermined repetitive algorithms to obtain a
meaningful measurement, the A-D conversion and these
high-speed processing functions can be performed by
dedicated circuitry, and the measured value supplied
periodically to the processor core via the memory mapped
input/output circuitry. This principle is applied in the
gas flow meter disclosed below as an example application,
with reference to Fig. 11.

PROGRAMMER'S MODEL 

Fig. 2 shows the logical arrangement of internal
registers and memory space of the processor of Fig. 1,
commonly known as the programmer's model. Registers AH,
AL, X and Y are provided for transient storage of data
and address values within the processor core, together
with a register for the program counter PC, and for
various flags (C, S, N, Z) generated by the arithmetic
and logic circuits of the processor.

The program storage space (memory 104) comprises 64K
(65536 decimal) locations, each storing an 18 bit
instruction word. The instruction words are of fixed
format, and the format is illustrated at the foot of Fig.
mentioned above in relation to Fig. 2, the bits
INSTRN[3:2] of the instruction word are used to identify
registers in instructions having the operand "reg" only.
Also: the '@' indicates "contents of"; 'addr16' is the
10 bit address value INSTRN[17:8] sign-extended to 16
bits; '12' is simply an example data value; while 'PC'
indicates the address of the current instruction.
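
The sign extension of 'addr16' can be illustrated with a short Python sketch. This is only a model written for this description, not part of the circuit: the function names are hypothetical, and the optional index argument stands for the zero, X or Y value that the text on Data Address Modes says is added to the extended address.

```python
def addr16(instrn: int) -> int:
    """Sign-extend the 10-bit field INSTRN[17:8] of an 18-bit word to 16 bits."""
    field = (instrn >> 8) & 0x3FF      # extract bits 17..8
    if field & 0x200:                  # top bit of the field is the sign
        field -= 0x400
    return field & 0xFFFF              # present as a 16-bit value


def effective_address(instrn: int, index: int = 0) -> int:
    """Add an index (zero, X or Y) to addr16, wrapping at 16 bits."""
    return (addr16(instrn) + index) & 0xFFFF
```

With this model, a field of all ones extends to H'FFFF (that is, -1), so an indexed access with that field addresses the location just below the index register value.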

DATA ARCHITECTURE

Fig. 3 shows in block form the data architecture of the
processor core. The operation of all elements in a given
cycle is controlled in accordance with one program
instruction by a control and decode section 300. The
principal elements of the data architecture are the
arithmetic and logic unit (ALU) 302; the A register 304
(32 bits comprising AH and AL); X register 306 and Y
register 308 (16 bits each); program counter (PC)
register 310 (16 bits); shift/load logic 312; a PC
incrementer 314, comprising a 16 bit half adder; an
address sign extender 316; an address adder 318; various
multiplexers 320 to 332; a flags and condition logic 334;
and a tri-state buffer 335.

Communication with the program memory is via the program
address bus PROG_ADDR and the instruction bus INSTRN.
Communication with the data memory (including ROM, RAM
and memory mapped I/O circuits) is via the data address
output ADDR and the bi-directional data bus DATA. Also
provided is the serial interface register having an
address part 336 (bits SIF_ADDR[15:0]), an address space
determining part 338, a read/write control part 340, and
a data part 342 of eighteen bits. Also associated with
the serial interface is a tri-state buffer 346.
register, the other from memory or as an immediate, and
returning the result to the same register. The ALU A
input receives the register input and the ALU B input the
memory/immediate value. The ALU performs the required
logical or arithmetic operation and presents the result
on the E output from where it can be written back into
the same register. For the AL/AH registers 304, the
additional shift/load block 312 is inserted between E and
the registers. This allows shift operations to be
performed on the combined register pair and is also an
integral part of the scheme by which multiply and divide
are performed in this embodiment.

No Operand Instructions: NOP, BRK, PRINT, SIF

NOP, BRK, PRINT and SLEEP do not involve the data
architecture at all. SIF executes as a SIF cycle if a
SIF request is pending (described more fully below),
otherwise it behaves as a no-op.

Data Address Modes

Memory/immediate values are generated and applied to the
ALU B input, using the multiplexers 320, 322, 324, 328
and neighbouring components. There are four data address
modes: Immediate, Direct, Indexed X and Indexed Y.
Immediate takes a value directly from the instruction
(I[15:0]). The other modes use the instruction value to
select a value from memory. The top 10 bits of the
latched instruction are sign extended at 316 to 16 bits.
To this is added (318) either zero (Immediate or Direct),
X (Indexed X) or Y (Indexed Y) as selected by multiplexer
324. The output of adder 318 is fed to the ADDR bus via
multiplexer 326. For immediate values multiplexer 328
couples the output of adder 318 directly to the ALU B
by the processor as a no-op.

Dual Operand Instructions: ADD, ADDC, SUB, SUBC, NADD,
CMP, OR, AND, XOR

These instructions operate between a register and a
memory/immediate value, returning the result to the same
register. The register value is presented on ALU input
A via multiplexer 320. The memory/immediate value is
presented on ALU B input as above.

The result E from the ALU is fed to the appropriate
register. For these instructions multiplexer 330 and the
shift/load unit 312 are both set so that E is propagated
unchanged to X, AH and AL as well as Y. The appropriate
register only is clocked.

Arithmetic instructions set all four FLAG bits out of the
ALU. Logical instructions set only the N and Z bits.

Branch Instructions: BRA, BLT, BPL, BNE, BEQ, BCC, BCS

Normally multiplexer 332 is set to 0 so that at the end
of each instruction, PC is clocked and the program
counter increments. PROG_ADDR, the address in program
space, is equal to the PC value.

When a branch instruction is executed, if BRANCH_TRUE is
high (output by the condition logic 334), multiplexer 332
is switched over so as to load a new PC value from the ALU
output. Otherwise, the PC increments as normal.
BRANCH_TRUE checks the branch condition by reference to
the appropriate flag bits, using a multiplexer which is
hard wired to the appropriate instruction code bits.
register 'A' and are performed entirely locally to the
shift/load logic 312; the ALU is not involved. This
allows the shift operations to be performed over the full
32-bit A register, while the ALU retains its compact
16-bit size. Shifts are performed by successive one bit
shifting in this embodiment, requiring simply 32 three-way
multiplexers, to give left shift, no shift or right shift
per bit. Six different operations are possible, the
appropriate one being selected by the control unit 300
in accordance with the current instruction: (1,2) load
AH or AL directly with no shift; (3) shift AH and AL
left; (4) load AH while shifting left; (5) shift AH and
AL right; and (6) load AH while shifting right.

The number of bit positions to shift is specified as a
memory/immediate value to the instruction. This is read
off from the ALU B path by the control architecture and
used to generate the appropriate number of cycles. Each
cycle then shifts by one bit left or right.

In the condition logic 334, the SHIFT_IN value is
selected to allow the carry flag C to be shifted in for
SCL/SCR shifts, zero to be shifted in for SAL, and the
current sign bit (AH bit 15) to be shifted in (extended)
for SAR. SHIFT_OUT, the bit shifted out of A, is loaded
into the carry flag C with each bit shifted.
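
The one-bit-per-cycle shifting just described can be modelled in a few lines of Python. This is a behavioural sketch only: the function name is hypothetical, and it assumes that the trailing L or R of each mnemonic denotes the shift direction. The special handling of a 0 bit shift is not modelled.

```python
MASK32 = 0xFFFFFFFF


def shift_a(a: int, carry: int, op: str, count: int):
    """Shift the 32-bit A register one bit per cycle; return (A, C)."""
    if op not in ("SAL", "SAR", "SCL", "SCR"):
        raise ValueError(op)
    a &= MASK32
    for _ in range(count):
        if op in ("SCL", "SCR"):
            shift_in = carry              # carry flag is shifted in
        elif op == "SAL":
            shift_in = 0                  # zero is shifted in
        else:
            shift_in = (a >> 31) & 1      # SAR extends the sign bit (AH bit 15)
        if op in ("SAL", "SCL"):
            carry = (a >> 31) & 1         # SHIFT_OUT is the top bit of A
            a = ((a << 1) | shift_in) & MASK32
        else:
            carry = a & 1                 # SHIFT_OUT is the bottom bit of A
            a = (a >> 1) | (shift_in << 31)
    return a, carry
```

Each pass through the loop corresponds to one processor cycle, so a shift by n bits costs n cycles in this model, in line with the variable shift timing stated above.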

Multiply Instruction: MULT

Multiply is performed by repeated shift and add and takes
16 cycles. The algorithm uses a combination of the ALU
302 and the shift/load logic 312, shown schematically in
Fig. 4A for one cycle. The shift/load block 312 is
multiplexer controlled by the instruction decoder and,
during MULT, by the current lowest bit A[0] in the A
Divide Instruction: DIV

Divide is performed by repeated subtraction and takes 16
cycles. Again, the algorithm uses a combination of the
ALU 302 and the shift/load logic 312, as shown
schematically in Fig. 4B. The dividend is the initial
value of AH, AL, the divisor is a memory/immediate value
presented on ALU input B and the result and remainder are
generated in AH, AL. The pseudo-code for DIV is:

    Repeat 16 times
        ALU_RESULT := A[31:15] - ALU_B
        If ALU_CN == 1 then
            A[31:0] := {A[30:0],0}     -- shift left, shift in 0
        else
            A[31:15] := ALU_RESULT     -- subtract ALU_B from top 16 bits
            A[31:0] := {A[30:0],1}     -- shift left, shift in 1
        endif
    end

The ALU calculates A[31:15] minus ALU_B on each cycle,
and according to the value of ALU_CN, the control
architecture sets the shift/load logic 312 either to
shift left (when ALU_CN = 0) or to load AH and shift left
(when ALU_CN = 1).
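
The divide loop can be checked with a small Python model. This is a sketch under stated assumptions rather than a description of the circuit: the function name is hypothetical, the branch is taken on whether the trial subtraction goes negative (the polarity of ALU_CN is not modelled), and the dividend is assumed to satisfy AH < divisor so that the quotient fits in 16 bits.

```python
def div_32_by_16(dividend: int, divisor: int):
    """16-step restoring division; returns (quotient, remainder)."""
    a = dividend & 0xFFFFFFFF
    b = divisor & 0xFFFF
    for _ in range(16):
        trial = (a >> 15) - b                 # A[31:15] minus ALU_B
        if trial < 0:
            a = (a << 1) & 0xFFFFFFFF         # shift left, shift in 0
        else:
            a = (trial << 15) | (a & 0x7FFF)  # write the difference to A[31:15]
            a = ((a << 1) | 1) & 0xFFFFFFFF   # shift left, shift in 1
    return a & 0xFFFF, a >> 16                # AL holds quotient, AH remainder
```

After the sixteen steps the quotient has been shifted into AL one bit at a time while the running remainder has stayed in AH, which is why the result and the remainder both end up in the A register pair.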

SIF cycles

There are four cases of SIF cycles: memory read, memory
write, register read and register write. These will be
described more generally later, with reference to the
generalised embodiment of Fig. 7. Briefly, and with
reference to Fig. 3, SIF memory read cycles set
multiplexer 326 to the SIF address and load the result
directly and in parallel from the DATA bus into the SIF_DATA
register 342.

SIF memory write cycles set multiplexer 326 to the SIF
address and write directly from the SIF_DATA register 342
to the DATA bus via the SIF_WRITE tri-state buffer 346.

SIF register read cycles use the DATA bus as an
intermediate. The register to be read is selected using
multiplexers 320 and 322. This can be AH, AL, X, Y, PC,
Flags or the current instruction. This value is enabled
onto the DATA bus via tri-state buffer 335 and from there
to the SIF_DATA register 342 as if it were a memory read.

Note that the top 2 bits of the current instruction are
fed directly to the SIF_DATA register and will be loaded
for SIF read operation. These bits are only of
interest when reading from the current instruction,
however, since each instruction word is eighteen bits
wide, while the DATA bus alone can only be used to obtain
the lowest sixteen bits.

SIF register write cycles again use the DATA bus as an
the following description of pin functions, the pin name
and a pin type (input, output or bi-directional
(tri-state)) is presented. The functions and operations of
the pin are then described.

Figs. 5A to 5G of the drawings are presented to
illustrate the waveforms present on the pins, as
described below.

RST (input) - Asynchronous reset. Resets processor to
known state.



    Registers and flags     0
    Sleep state             Awake
    Run/Stop state          Run
    SIF cycle               None pending



On releasing RST the processor will start executing code
from address 0. The internal reset signal is held active
until a falling clock edge after RST is released to
ensure a clean restart.

PCLK (input) - Processor clock. As shown in Fig. 5A, each
processor cycle requires 4 PCLK clocks. Both edges of
the clock are used. Cycles always start on a rising
clock edge and comprise this and the following 7 clock
edges. The edges are referred to as clock 0 through
clock 7 and the parts of the cycle are numbered according
to the clock they follow. PCLK may be stopped and
restarted at will to switch the processor on and off.
Note that output signals will freeze in whatever state
they were in and that the processor will not be sensitive
to any input signals except RST whilst the clock is
stopped.



The clock sequencer is disabled when the processor is
idle. The phase of cycles is therefore not fixed from
reset for all time. Most processor instructions execute
in a single cycle.

WAKE_UP (input) - Wakes the processor from sleep. As
shown in Fig. 5B, when the processor executes the SLEEP
instruction it goes into a sleep mode where activity is
minimised and the power consumption is very low. WAKE_UP
is then sampled on the rising edge of every PCLK and the
processor wakes up and restarts with the instruction
following SLEEP when a 1 is detected. In terms of the
cycle, the PCLK rising edge where the 1 is detected is
clock 0. Note that if WAKE_UP is held at 1, SLEEP
behaves just like a NOP.

STOPB_OUT (output), STOPB_IN (input), RUN_STEP (input) -
Processor start/stop, breakpoint and single step debug
facility. This is intended only for debugging and is not
for use in the application. These signals must be
brought out to outside pins STOPB and RUN_STEP via pads
as shown. The description that follows describes the pin
level behaviour.

The processor is either running or stopped. This is a
top level state above whether it is asleep or awake.
STOPB is open-drain with an internal pullup and is
normally high.

As shown in Fig. 5C, when the processor executes the BRK
instruction, it drives STOPB low and stops. This
indicates to the outside world that the processor has
reached a breakpoint. The processor can be manually
stopped by pulling STOPB low externally during a rising
PCLK edge. The processor immediately takes over holding
STOPB low
and will stop at the end of the current instruction.

RUN_STEP is an input with a built-in pullup. As shown
in Fig. 5D, when high, it forces the processor into the
run state. RUN_STEP over-rides STOPB, so that if it is
held high, the processor will run continually, ignoring
breakpoints and stop requests. In normal use, the pin
should be allowed to pull itself into this condition.
For debug, RUN_STEP should be normally low. Breakpoints
are then enabled. Note that this means that strategic
breakpoints can be left in the final code and enabled by
control of RUN_STEP. This is a powerful tool for test
and verification purposes. RUN_STEP is then taken high
for one clock to restart the processor.

As shown in Fig. 5E, single stepping requires control of
both STOPB and RUN_STEP (illustrated for a single cycle
instruction):

    Take RUN_STEP high
    Wait until STOPB rises
    Take RUN_STEP low and drive STOPB low
    Wait > 1 clock
    Release STOPB
PROG_ADDR (output), PROG_CLK (output), INSTRN (input) -
Program space memory interface. Unclocked or clocked
ROMs can be used. PC is the instruction address (16
bits). As shown in the Fig. 5F waveforms, PROG_CLK is the
clock for clocked ROM (active rising edge) and INSTRN is
the instruction data (18 bits).

For multi-cycle instructions, PROG_CLK rises on the first
cycle, stays high for intermediate cycles and falls on



the last cycle. PC changes only on the last cycle.

ADDR, R_WB (output), LIM_WRITEB (output), DATA_CLK
(output), DATA (bidirectional) - Data space memory
interface. Unclocked or clocked memory can be used.
ADDR is the address (16 bits). R_WB is the read/write
select line. LIM_WRITEB allows protection of parts of the
memory space from SIF writes. DATA_CLK is a cycle strobe
which also acts as a clock for clocked memory and DATA
is the bidirectional data bus (16 bits). See the
waveforms of Fig. 5G.

If an instruction does not require a memory cycle,
DATA_CLK will stay low. For multi-cycle instructions,
DATA_CLK will go high on the first cycle, stay high for
intermediate cycles and go low on the last cycle. ADDR
will be valid from the first cycle to the last.

LIM_WRITEB is normally high. It goes low when a write
to memory is requested from the SIF in normal operation
(as opposed to in debug with the processor stopped).
Including this signal in an address decode prevents user
writes to the decoded device from the SIF. This is
necessary for protecting some applications against
accidental damage. Other devices may be written from the
SIF, allowing for instance communication between the ASIC
and a general purpose microcontroller.

TEST_OUT (output) - IC test output. An XOR of the ALU
output bus which can be used to give visibility of
processor operation for IC test. It is either brought
out as a specific pin or included in an XOR tree which
SER_IN (input), SER_OUT (output), SER_CLK (input),
SER_LOADB_OUT (output), SER_LOADB_IN (input) - Serial
interface (SIF). These signals are brought to outside
pins SER_IN, SER_OUT, SER_CLK and SER_LOADB via pads as
shown. The description that follows describes the pin
level behaviour.

The SIF provides a method for transferring data serially
into and out of a device, by means of a shift register
in the device. The present processor uses a SIF shift
register length of 36 bits. The data input to the shift
register is SER_IN and the data output SER_OUT. SER_CLK
is the clock. Data is clocked into the shift register
on the negative edge, and out on the positive edge.
SER_CLK is completely asynchronous to the processor clock
and can be faster or slower than it if required. Data
is clocked in and out MSB first. SER_LOADB is used to
co-ordinate transfers via the shift register.
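
As an illustration of MSB-first transfer through a 36 bit shift register, the following Python sketch packs the four register parts named in the data architecture (16 bit address, address space part, read/write part, 18 bit data) into one word and lists the bits in clocking order. The field order and the one-bit widths of the address space and read/write parts are assumptions made for this sketch only, and the function names are hypothetical.

```python
def pack_sif_word(addr: int, space: int, write: int, data: int) -> int:
    """Pack the four register parts into one 36-bit value (assumed layout)."""
    return (((addr & 0xFFFF) << 20)     # assumed: address in the top 16 bits
            | ((space & 1) << 19)       # assumed: one address-space bit
            | ((write & 1) << 18)       # assumed: one read/write bit
            | (data & 0x3FFFF))         # 18 data bits


def msb_first_bits(word: int, width: int = 36):
    """List the bits in the order they would be clocked, MSB first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]
```

A master apparatus modelled this way would present the first list element on SER_IN before the first SER_CLK pulse and the last element before the thirty-sixth.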

The serial interface allows (1) the program space to be
read, (2) the data space to be read and written and (3)
the processor registers to be read and written. In
normal running, SIF transfers are only carried out when
a SIF instruction is executed. When the processor is
asleep or stopped, SIF transfers are carried out
immediately, starting the cycle engine for one cycle to
perform the transfer. In normal running, the registers
and limited data space areas cannot be written. In
stopped state the registers and the full data space can
be written to.

SER_LOADB is an open-drain output with an internal
pullup. To perform a cycle, the shift register is loaded
and SER_LOADB pulled low for > 1 clock. The processor
detects the transfer request and holds SER_LOADB low



until the cycle has been completed. This handshake
system is robust and places no timing constraints on
either side of the interface. SER_CLK must be low when
SER_LOADB is low to allow the shift register to be loaded
by the processor.

Fig. 6 shows the SIF shift register arrangement. The
operation of the SIF will be described in more detail
below. The SLEEP bit is read only; undefined bits read
as 0.

Program space read operations read from the PC address
and increment PC. This allows sequential reads without
specifically setting up an address. The definition
allows just '1's to be shifted into SER_IN as the data
is read out to repeat.

NOTES

Literals which are outside the range -512 .. +511 can be
automatically implemented as direct by the assembler with
the value stored in data ROM. From the user point of
view, it will seem that full range literals are available.

Branch targets which are outside the PC relative range
can be automatically implemented as direct by the
assembler with the offset stored in data ROM. From the
user point of view, it will seem that branches to anywhere
in address space are available. When executing a PC
relative branch, PC points to the branch instruction.

Subroutine calls use the X register for the return
address:
            BSR   fred      ; Branch to subroutine with BSR

    fred:                   ; Subroutine
            BRA   0,X       ; Return with BRA

If the subroutine itself calls a subroutine or otherwise
uses X, it must be saved in memory and restored before
exit. Conditional returns can be implemented by using
BCC instead of BRA. Multiple exit points can be
implemented by returning to 0,X or 1,X etc.

Addition and subtraction operations work on signed or
unsigned, integer or fractional values. ADD, ADDC and
NADD treat C as a carry. SUB, SUBC and CMP treat C as
a borrow. ADDC and SUBC facilitate straightforward
multi-precision arithmetic. The S flag applies to
operations on signed values and gives the true sign of
the result, independent of overflow into the MSB. It is
calculated as N XOR V and is especially useful for CMP
where it indicates signed "less than" (N only indicates
this over a limited operand range).

MULTiply is signed. Integer and fractional multiplies
are implemented as follows ("&" is the assembler macro
parameter substitution operator):



    IMULT data
            ; Multiply integers AL * data. Result in A
            LD    AH,#0
            MULT  &data

    FMULT data
            ; Multiply fractionals AL * data. Result in A
            LD    AH,#0
            MULT  &data
            SAL   #1
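
The effect of the two macros can be sketched in Python as follows. This models only the arithmetic result left in A, not the 16 cycle shift-and-add sequence. The function names are hypothetical, and the reading of "fractional" as a signed 16 bit fraction whose product needs one left shift to realign is an assumption of the sketch.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def imult(al: int, data: int) -> int:
    """Signed 16 x 16 multiply; returns the 32-bit pattern left in A."""
    return (to_signed16(al) * to_signed16(data)) & 0xFFFFFFFF


def fmult(al: int, data: int) -> int:
    """Fractional multiply: the integer product followed by SAL #1."""
    return (imult(al, data) << 1) & 0xFFFFFFFF
```

For example, H'4000 represents one half under the fractional reading, and fmult(0x4000, 0x4000) returns H'20000000, which is one quarter in the corresponding 32 bit format.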
Integer multiply never overflows, even if AH is non-zero
Note that A is a long integer or a double precision
fractional in the above. IDIV and FDIV will overflow if
AH > data. No indication of overflow is provided.

To implement a full signed divide, the operands must be
made positive before the divide and the result corrected
afterwards.

Shifts can be from 0 to 15 bits. A 0 bit shift leaves
A unchanged and sets C as if a 1 bit shift had been
performed. This can be used to get the top or bottom bit
of A into C.

Some other instructions common on other processors are
implemented as macros:



    CLC
            ; Clear carry bit. Note affects S,N,Z
            ADD   AL,#0

    SEC
            ; Set carry bit. Note affects S,N,Z
            NADD  AL,#-1
            XOR   AL,#H'FFFF

    NEG reg
            ; Negate reg. C is set as carry. NEG
            ; instrn often sets as borrow
            NADD  &reg,#0

    NEGA
            ; Negate 32 bit A register. Note C set
            ; as carry
            XOR   AH,#H'FFFF
            NADD  AL,#0
            ADDC  AH,#0

    RTS
            ; Return from subroutine
            BRA   0,X

Branch instructions for the full range of signed and
unsigned comparisons are implemented as macros:
    BHI branch_addr
            ; Branch higher (unsigned)
            BCS   &skip
            BNE   &branch_addr
    &skip:

    BHS branch_addr
            ; Branch higher or same (unsigned)
            BCC   &branch_addr

    BLO branch_addr
            ; Branch lower (unsigned)
            BCS   &branch_addr

    BLS branch_addr
            ; Branch lower or same (unsigned)
            BCS   &branch_addr
            BEQ   &branch_addr

    BGT branch_addr
            ; Branch greater than (signed)
            BLT   &skip
            BNE   &branch_addr
    &skip:

    BGE branch_addr
            ; Branch greater or equal (signed)
            BLT   &skip
            BRA   &branch_addr
    &skip:

    BLE branch_addr
            ; Branch less than or equal (signed)
            BLT   &branch_addr
            BEQ   &branch_addr

The primitives BLT, BEQ and BNE complete the set.
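
The logic of these macros can be checked against ordinary comparisons with a short Python model. The model is a sketch under assumptions: it takes the flags to be those left by a CMP of a against b (C as borrow, Z as zero, S as N XOR V), and it assumes that BCS/BCC test C, BEQ/BNE test Z and BLT tests S. The names used are hypothetical.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def cmp_flags(a: int, b: int) -> dict:
    """Flags after comparing 16-bit a with b (a - b), C treated as borrow."""
    a &= 0xFFFF
    b &= 0xFFFF
    diff = (a - b) & 0xFFFF
    n = bool(diff & 0x8000)
    # V is set when the signed difference does not fit in 16 bits.
    v = not (-0x8000 <= to_signed16(a) - to_signed16(b) <= 0x7FFF)
    return {"C": a < b, "Z": diff == 0, "S": n != v}


# Each macro reduced to the condition under which branch_addr is reached.
MACROS = {
    "BHI": lambda f: not f["C"] and not f["Z"],   # BCS skip; BNE target
    "BHS": lambda f: not f["C"],                  # BCC target
    "BLO": lambda f: f["C"],                      # BCS target
    "BLS": lambda f: f["C"] or f["Z"],            # BCS target; BEQ target
    "BGT": lambda f: not f["S"] and not f["Z"],   # BLT skip; BNE target
    "BGE": lambda f: not f["S"],                  # BLT skip; BRA target
    "BLE": lambda f: f["S"] or f["Z"],            # BLT target; BEQ target
}
```

Under these assumptions each macro agrees with the corresponding unsigned or signed comparison for all operand pairs, including those where the subtraction overflows, which is the case the S flag exists to handle.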

It is best to avoid using manual PC relative branches (e.g.
BRA $+1). These will not work correctly in the multiple
instruction macros above and in general make assumptions
about instruction lengths which can cause problems when
branching over macros.
BRK, if it causes the processor to stop, leaves PC
pointing at the BRK instruction. If it is over-ridden
by the external signal RUN_STEP, BRK behaves as a NOP.

PRINT is a debugging aid. It is a NOP as far as the
processor is concerned but is detected by a gate level
model debugger and simulator and, if enabled, will
print out the instruction's immediate value and specified
register. The immediate value can be used to indicate
the position in the program. The register value may or
may not indicate some useful result.
    ST   X,@(-1,SP)                 ; Store return address
    SUB  Y,#(LENGTH fred_s + 1)     ; Allocate space
         @(.var1,SP)                ; Local variable address
    ADD  Y,#(LENGTH fred_s + 1)     ; Return space
    BRA  @(-1,SP)                   ; Return directly from stack

Note the convention of storing the return address first
on the stack.

CONSTRUCTION AND OPERATION OF THE SERIAL INTERFACE (SIF) 
As described above with reference to Figs. 1 to 6, the
serial interface (SIF) operates to allow external access
to the memory spaces and registers of the processor via
the external pins of the integrated circuit. Fig. 7
shows one practical implementation. The processor in
Fig. 7 may be the same as that of Figs. 1 to 6, but could
equally be of an entirely different design.

The main part of the processor is illustrated at 500, and
is shown schematically connected to the program memory
104, and the data memory and I/O 106, 108, 110. Within
the main part of the processor core, an instruction latch
502 receives program instruction words from the program
memory 104, which are decoded by a control section 504
of the processor. The registers of the processor are
shown at 506. The arithmetic and logic unit (ALU) and
other functional units of the processor core are grouped
schematically in a block 508. A data bus of the
processor is shown at 510. Fig. 7 does not show in any
detail the data paths and control lines between the
various elements 502 to 510, which can readily be
implemented using conventional techniques, for example
to implement the functionality detailed above in relation
to the instruction set, data architecture and pin
description.

The serial interface shift register is physically
embodied at 512 and features the data field, address
field, read/write field and address space field which are
shown in Fig. 6. In response to the serial interface
clock signal SER_CLK, one bit at a time can be shifted
from the shift register 512 out of the chip (SER_OUT)
and/or into the chip (SER_IN). An interface control
logic 514 is shown associated with the shift register 512
and the fourth control line of the interface (SER_LOADB).
In practice, the control units 504, 514 of the
processor may be implemented as a single functional unit,
or further sub-divided as desired.

The address and read/write signals applied to the data
memory 106, 108, 110 are primarily supplied (ADDR,
R_WB) by conventional elements 504, 508 of the
processor core 500, for example as described in the pin
description above. However, a multiplexer 516 is provided
in these address and control lines, so that the address
applied to the memory can instead be that which is
contained within the interface register 512 (SIF_ADDR).
Similarly, the read/write control bit for the data
memory can be derived from the relevant bit of the
shift register 512 (SIF_R_WB) when the multiplexer 516
is controlled appropriately. In particular, a control
line SIF_CYC is driven by control and instruction [...]


WO 96/09583 PCT/GB95/02283

multiplexer is activated in this way during the cycle of 
execution of the special SIF instruction, assuming that 
the bit SIF_A_SPACE indicates that the data address space 
is to be accessed in a given SIF operation. 

Similarly, a multiplexer 518 and tri-state buffer 520 are
driven to cause the data bits of the shift register 512
to be driven onto, or loaded from, the data bus 510 during
a SIF write or SIF read operation, respectively. The
main part of the processor core 500 ignores the data
present on the data bus 510 during the execution cycle
of a SIF instruction.

For the case where access to the registers or program
memory space of the processor is desired through the
serial interface SIF, as indicated by the bit SIF_A_SPACE
of the word loaded into the shift register 512, the
multiplexer 518, a tri-state buffer 522 and a
bi-directional selection circuit 524 provide access between
the data bits of the shift register 512 and the internal
registers 502, 506 of the processor core 500. The
selection circuit 524 is controlled by the lower 4 bits
SIF_ADDR[3:0] of the address field within the shift
register 512, as detailed in Fig. 6. Therefore, during
a SIF instruction execution cycle, any of the registers
PC, F (flags), AH, AL, X or Y can be read or written, or
the currently addressed location of the program memory
104 can be read via the instruction latch 502. As
described already, the program counter value stored in
register PC can be incremented automatically to allow
sequential access to a range of locations in the program
memory 104, when the debug mode is activated. In
practice, the selection circuit 524 and multiplexer 518
may readily be combined with existing data path selection
components of the processor core 500, to achieve a very
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state. At 706, the command which has been loaded into 
the register 512 of the interface is processed during the
next available SIF instruction cycle. When a write
instruction is indicated by the bit SIF_R_WB in the
register, this causes the DATA field to be written into
the storage location of the ASIC determined by the fields
SIF_A_SPACE and SIF_ADDR. In the case of a read
operation, the DATA bits of the register 512 are loaded
with data read from that location. Only after the SIF
operation has been processed within the ASIC does the
control circuit 514 release the fourth wire of the
interface (step 708). This is to inform the external
apparatus that the write operation is completed, or that
the information to be read can be clocked out of the
serial interface now.

It is not guaranteed, however, that the external
apparatus will already have released the fourth wire
(step 606), and a loop is implemented at step 710 to
monitor the state of the fourth wire. Only when this
wire is seen to go high again does control return to step
702. In this way, repeated SIF instructions are not
implemented merely because the external apparatus is very
slow to release the fourth wire.
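
The handshake on the fourth wire can be illustrated with a toy model of the ASIC side. The Python sketch below is an illustration under stated assumptions, not the circuit of Fig. 7: the class SifSlave and its method names are hypothetical, the fourth wire is modelled as low whenever either party pulls it low, only a single address space is modelled, and the field widths (16-bit address and data) are assumed.

```python
class SifSlave:
    """Toy model of the ASIC side of the fourth-wire handshake."""

    def __init__(self):
        self.memory = {}          # address -> 16-bit word (one space only)
        self.write = False        # decoded read/write field
        self.addr = 0             # address field
        self.data = 0             # DATA field of the interface register
        self.holding_low = False  # True while the ASIC pulls the wire low
        self.armed = True         # False until the wire is seen high again

    def load_word(self, write, addr, data=0):
        """Fields clocked into the interface register by the master."""
        self.write = write
        self.addr = addr & 0xFFFF
        self.data = data & 0xFFFF

    def wire(self, master_low):
        """Level of the fourth wire: 0 if either side pulls it low."""
        return 0 if (master_low or self.holding_low) else 1

    def sample(self, master_low):
        """Latch a request; re-arm only after the wire has gone high."""
        if self.wire(master_low) == 1:
            self.armed = True
        elif master_low and self.armed and not self.holding_low:
            self.holding_low = True   # ASIC now keeps the wire low itself
            self.armed = False

    def sif_instruction(self):
        """Called when the program reaches a SIF instruction."""
        if not self.holding_low:
            return False              # nothing pending
        if self.write:
            self.memory[self.addr] = self.data
        else:
            self.data = self.memory.get(self.addr, 0)
        self.holding_low = False      # release the wire: operation done
        return True
```

In this sketch a master that is slow to release the wire does not trigger a second access, because sample() only accepts a new request after it has seen the wire high again.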

Returning to Fig. 8, the right hand flow chart beginning
at step 620 illustrates the SIF read operation. Steps
622 to 628 are the same as corresponding steps 602 to 608
of the SIF write operation, except that at step 622, no
data needs to be clocked into the shift register 512 of
the ASIC, saving time. Also, of course, the bit SIF_R_WB
is set to indicate read instead of write. After it
has been detected at step 628 that both the external
apparatus and the ASIC have released the fourth wire
(SER_LOADB = 1) the data read from the desired address or
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ALU inputs A, B and CO, and also on the control lines 
from the instruction decoder 300. In addition to power
saving in this way, these and further latches may be used
to latch external inputs and to protect against "clock
skew" at various points.

Referring again to Fig. 3, it is also a power saving
feature of the present processor design that most parts
of the data architecture have dedicated data buses,
rather than a shared data bus. The DATA bus which leaves
the processor core and which is also connected to the SIF
register is an exception, in that it is driven by logic
with tri-state output buffers 345, 346 etc. However, as
described above, the output E of the ALU 302 is connected
by dedicated pathways to the registers 304 to 310, and
the outputs of these registers are similarly connected
to the inputs of the ALU by dedicated pathways and
multiplexers 320 to 328.

Compared with conventional processor designs in which all
such elements are interconnected by means of shared data
buses, the present processor design has eliminated many
tri-state buffers that would otherwise be required at the
outputs of the ALU and registers to drive a common data
bus. Also, each dedicated data path has a lower
capacitance than the conventional shared bus, with the
end result that the power consumption of the processor
core is lower.

Fig. 10 shows schematically circuitry implemented for
monitoring wake up signals during the SLEEP or STOPPED
states of the processor. In these states, the main
functional elements of the processor are not clocked by
the usual high frequency clock signal (PCLK in this
embodiment), to reduce power dissipation in those
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processor was running. At the same time, there is no 
power dissipation in the flip flop 808. The XOR gate 804
compares the actual input signal EXT1 with EXT1FF, and
generates the individual wake up signal CEN1 as soon as
there is any change in the external signal EXT1 relative
to EXT1FF. By the operation of OR gate 802, CEN goes
high, and the trigger circuit 812 sets CENFF high,
enabling the internal clock PCLKI for the entire
processor. At this point, the present state of signal
EXT1 becomes latched also as EXT1FF, so that the CEN1
signal itself disappears.

A similar operation is provided in relation to input EXT2
by the XOR gate 806 and the flip flop 810. Any number
of such inputs can be provided, with the same or
different monitoring circuitry.
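
The behaviour of the wake-up monitor can be sketched in a few lines of Python. This is a behavioural illustration only, with a hypothetical class name WakeMonitor. It assumes that the input flip flops follow their inputs whenever the internal clock is enabled and hold their last value otherwise, and it collapses the wake-up and the subsequent latching into a single step.

```python
class WakeMonitor:
    """Behavioural model of change-detection wake-up on several inputs."""

    def __init__(self, inputs):
        self.ff = list(inputs)   # EXT1FF, EXT2FF, ...: last latched states
        self.cenff = True        # internal clock enabled (processor running)

    def sleep(self):
        """Clock disable request: stop the internal clock."""
        self.cenff = False

    def step(self, inputs):
        """Evaluate one period of the running clock for the given inputs."""
        if not self.cenff:
            # One XOR per input compares it with its latched copy;
            # the OR of all XOR outputs is the combined wake-up signal.
            cen = any(x != q for x, q in zip(inputs, self.ff))
            if cen:
                self.cenff = True
        if self.cenff:
            # With the internal clock running, the flip flops latch the
            # present input states, so the wake-up signals disappear.
            self.ff = list(inputs)
        return self.cenff
```

While asleep, nothing in the model changes until an input differs from its latched copy, which mirrors the point that the monitoring flip flops themselves need no running clock.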

Compared with a circuit in which, for example, each
individual flip flop 808, 810 is clocked by the running
clock PCLK to generate a synchronous clock enable
signal, the present arrangement achieves a reduced power
dissipation in the sleeping or stopped state. Only the
clock input of a single flip flop within the trigger
circuit 812, in addition to the input of the gate 800,
needs to be continuously supplied with the running clock
PCLK, no matter how many inputs are being monitored. To
enter the sleep or stopped state, a clock disable signal
CDIS from elsewhere in the control logic of the processor
is applied to the trigger circuit 812, which sets CENFF
low and so disables the internal clock PCLKI.

APPLICATION EXAMPLE - DOMESTIC GAS METER 

WO-A1-95/04258 describes an ultrasonic domestic gas meter
apparatus, which measures gas velocity by a "time of
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have a very low average current consumption, have a very 
good long term reliability and very low unit cost when
manufactured in high volume.

Nowadays an ASIC (custom IC chip) solution might
naturally be adopted to implement a specialised
instrument of this type, and CMOS ASIC processes in
particular are known to provide many advantages such as
low cost and low power consumption. The use of this
technology enables the development for example of custom
analog cells to drive and receive signals from the
ultrasonic transducers.

However, the extremely low power consumption desired in
this example product can only be obtained when the
program controlled processor which implements the signal
processing and control functions is also integrated, with
its program store, on the same chip. Also, having the
processor on-chip would reduce interference emissions and
reduce susceptibility to interference from outside.
Aside from the obvious benefits of compliance with EMC
regulations with less shielding, higher emissions within
the product would impair the accuracy of the measurement
electronics.

Unfortunately, as mentioned in the introductory part of
this application, conventional processor designs tend to
be too expensive in terms of chip area, and/or are not
powerful enough per instruction for arithmetic-intensive
applications. Large circuit size and high processor
clock speed will only increase the power consumption.
The problem of verification of the design also arises
when all components and the control program are fixed in
the chip hardware, and when the program implements
real-time operations. It will not normally be possible with
repetitive processing of data from the analog circuitry
1104.

Reliable running at very low voltage, for example outside
in very cold weather at the end of the life of the
Lithium battery 1110, is more readily achieved with an
integrated solution, since the entire ASIC design can be
characterised for low voltages which would not be
possible using various standard components. Average
power consumption is reduced greatly by giving the ASIC
1100 control over the power supplies to the components
1112 to 1116. The processor core consumes very low power
when it is in the sleeping state, and zero power when the
clock signal PCLK is stopped. Simple timing circuitry
on the ASIC can be provided to start the processor clock
only intermittently, when measurements are required to
be taken, and the processor 100 when running can take
further control of the other circuitry of the ASIC 1100
and the printed circuit board 1105, activating only those
circuits which are required at a given time. The power
consumption of the processor is very low even when it is
running, due to various features mentioned above.
Furthermore, because the arithmetic instructions are
powerful for the size of the processor, the processor
does not need to execute so many instructions for a given
calculation.

To maintain economies of scale while providing a domestic
gas meter that can meet different national standards and
allow several product variants, the ASIC 1100 (including
its stored program) should be the same for all such
variants. This would not be achieved readily in
conventional microprocessor architecture, but in the
present example the difference between the products is
implemented in the low cost external microprocessor 1114
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FURTHER NOTES 

Those skilled in the art will recognise that the detailed
implementation of a microprocessor or other circuit
embodying any aspect of this invention need not be
limited to the examples given above. Of course the
details of the instruction set can be changed to suit a
given application, the widths of address and data busses,
and the widths of various fields in the instruction word
of the processor can be changed also. Even at a more
general level, the scope of the present invention
encompasses many individual functional features and many
sub-combinations of those functional features, in
addition to the complete combination of features provided
in the specific embodiment. Whether a given functional
feature or sub-combination is applicable in a processor
having a different architecture, for example a processor
with pipelined instruction decoding and execution, will
be readily determined by the person skilled in the art,
who will also be able to determine the adaptations or
constraints imposed by the changed architecture.

It will also be appreciated that, whereas the program
instructions and initial data for the processor operation
are permanently fixed in ROM storage on-chip, embodiments
are perfectly feasible for prototyping and/or final
production in which the ROM is replaced by E²PROM
(electrically erasable programmable read only memory) or
one-time-programmable ROM, where the processes used for
manufacture (and the costs) will permit.

All or part of the program store may in some cases need
to be off-chip. If the pin count associated with the
architecture is too high, it may be reduced for example
by providing an 8-bit program ROM, and performing
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of National Semiconductor Corp, with particular 
application to E²PROM components.

The SIF described above has the advantage of a very
simple and robust structure and protocol, with a single
master, and can be used in many of the same applications
as the known interfaces mentioned above.

Although the Microwire interface defines a simple fixed
master and slave relationship, the known interfaces like
I²C and Microwire impose fixed word lengths and strict
timing constraints on the slave device. The SIF as
described herein allows variable word lengths, and each
of master and slave can take as long as it needs to
respond to the interface. This avoids the need to
interrupt the flow of control in the ASIC at short
notice, in response to unpredictable external stimuli,
thereby simplifying the design and verification of new
ASIC designs. In particular, the programmer of the
present ASIC can keep track of real time simply by
calculation from the clock speed and known instruction
execution times. In a conventional processor design,
where interrupts may occur in response to external
stimuli, the current location in the program is no guide
to elapsed real time, and other timer mechanisms,
typically implemented by further interrupts, are required
to implement real-time dependent operations. Also, since
there is no speed constraint, the SIF can be used to read
values from any of the memory-mapped I/O devices, which
might require a lengthy wait for response from some
off-chip peripheral (keyboard or sensor).
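
The point about keeping track of real time by calculation can be made concrete with a short sketch. The cycle counts in the table below are hypothetical placeholders chosen for illustration and are not taken from this description; the function name elapsed_seconds is likewise an assumption. The sketch only shows that, with no interrupts, elapsed time follows directly from the instructions executed and the clock frequency.

```python
# Hypothetical cycle counts per instruction (placeholders, not from the text).
CYCLES_PER_INSTRUCTION = {"ADD": 1, "MULT": 16, "SIF": 1}


def elapsed_seconds(trace, clock_hz, cycles=CYCLES_PER_INSTRUCTION):
    """Elapsed real time for a known instruction sequence.

    Valid only because no interrupts exist: every instruction in the
    trace takes a fixed, known number of clock cycles.
    """
    total_cycles = sum(cycles[op] for op in trace)
    return total_cycles / clock_hz
```

For example, with the placeholder table above, the sequence ADD, MULT, SIF at a 1 MHz clock accounts for 18 cycles, that is 18 microseconds.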

Fig. 12 shows how it is possible to enable the SIF
architecture to allow a microcomputer to address plural
slave ASICs, for example by providing a separate "chip
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When the chip select is low, the slave will also ignore 
SIF_LOADB when it is pulled low by the master, and in 
particular will not latch SIF_LOADB low and will not 
queue up a SIF access operation for the next SIF
instruction cycle.

When SIF_CS is high for a given slave, however, then the
SIF_MISO output is enabled, and the SIF_LOADB line
input/output is enabled in exactly the manner described
above with respect to Fig. 7. Extension of the interface
handshake mechanism to provide for plural masters is
equally feasible, but will not be described herein.

Another possible modification of the SIF described above
concerns the SIF read operation. Since no address value
needs to be present in the shift register bits
SIF_ADDR[15:0] when the value is read out of the ASIC, it
would be possible for example for every SIF read
operation to provide access not only to the particular
memory location requested, but also to supply a fixed set
of status values such as the flags register or program
counter PC. These values could be available at little
extra cost, being loaded into the address field of the
interface register at the same time as the data field is
loaded, and need not be clocked out by the external
device if they are not of interest.

Another feature of the SIF type of interface is that the
separate data input and output wires SER_IN and SER_OUT
can be used simultaneously to read data from the ASIC and
to load another word into the SIF shift register on the
ASIC, to set up the next read or write operation. This
potential for parallel operation of SER_IN and SER_OUT
at each cycle of the clock SER_CLK is illustrated by the
double broken lines in between the flowchart step 630 and
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may be replaced by a family of instruction codes, 
allowing different types of serial interface access. For
example, a programmer may be happy to allow SIF read
operations more often during the running of the program
than SIF write operations. In such a case, separate
instructions SIF_READ_ONLY (allowing the interface only
to read from the memory space) and SIF_ALL (allowing both
read and write operations) might be defined, for example.
Then, if a SIF_WRITE instruction is received via the
interface, this will only be processed during the next
SIF_ALL cycle, irrespective of how many SIF_READ_ONLY
cycles may have been executed in the meantime.
Similarly, different instructions might be provided to
allow access to different parts of the processor address
space at different times. Of course, for each
instruction of this type, it is still the case that the
programmer determines only the timing of the memory
access, while the specific memory access operation
desired is defined by the external apparatus.
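
The scheduling rule for such a family of instructions can be sketched as follows. This Python fragment is only an illustration of the rule stated above, using the example instruction names SIF_READ_ONLY and SIF_ALL; the function name service_point and the representation of a program as a list of mnemonics are assumptions made for the sketch.

```python
def service_point(program, request_is_write):
    """Index of the instruction at which a pending request is serviced.

    A read request is accepted by either SIF_READ_ONLY or SIF_ALL;
    a write request waits for the next SIF_ALL. Returns None if the
    program never offers a suitable cycle.
    """
    for index, mnemonic in enumerate(program):
        if mnemonic == "SIF_ALL":
            return index
        if mnemonic == "SIF_READ_ONLY" and not request_is_write:
            return index
    return None
```

With a program such as ADD, SIF_READ_ONLY, SUB, SIF_READ_ONLY, SIF_ALL, a pending read is serviced at the first SIF_READ_ONLY, while a pending write passes both SIF_READ_ONLY cycles and is serviced only at SIF_ALL.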

The de-bug control circuitry, featuring the control lines
STOPB and RUN_STEP, is similarly applicable in a wide
range of processor architectures and applications. While
this control mechanism is particularly useful in
combination with the serial interface functions
described, these features are also of use independently.
The provision of a breakpoint instruction (BRK above)
which is conditional on the existence of the de-bug mode
is also advantageous in itself, particularly where the
microprocessor control program is stored in ROM memory
on chip. As described above, the breakpoint instruction
BRK can be present in all prototype and final versions
of the stored program, but will be effectively ignored
by the processor during normal operation.

The SLEEP operation, being defined in the instruction
set, also gives the programmer control of the point in
program execution at which the SLEEP state will be
entered, even when the SLEEP state is to be commanded
from outside the processor. Using the WAKE_UP signal
line, it will always be delayed until the processor is
at a certain point in the program execution, which allows
a more verifiable design. Similarly, the processor will
always wake up at a known point in the program, which in
this embodiment is the instruction after SLEEP.

The above and other generalisations will be obvious to
the skilled reader and are within the scope of the
present invention. Although various specific aspects of
the invention are defined above and in the attached
claims, the applicant reserves the right to claim any
novel feature or novel combination of features disclosed
explicitly or implicitly herein.

CLAIMS 

1. A data processing apparatus including a processor
constructed to operate under the control of a stored
program comprising instructions selected from a
predetermined instruction set, wherein an interface is
provided to allow an external apparatus to signal a
request for communication with the processor, and wherein
means are provided to cause the processor to communicate
in accordance with such requests only during
predetermined periods of time in the execution of the stored
program.

2. An apparatus according to claim 1 wherein each
communication request is interpreted by the processor in
accordance with a specific communication instruction
loaded into an interface register by the external
apparatus.

3. An apparatus according to claim 1 or 2, wherein said
predetermined periods are defined by the inclusion in the
stored program of a generic communication instruction
(SIF) of the instruction set, the generic communication
instruction having a fixed execution time.

4. An apparatus according to any of claims 1 to 3,
wherein the processor and stored program are provided on
a single integrated circuit.

5. An apparatus according to any of claims 1 to 4,
wherein said interface comprises a serial interface.

6. An apparatus according to claim 5, wherein said
serial interface includes separate data lines for input
and output of data to the processor and wherein data can
be shifted serially into and out of the processor in
parallel via said data lines.
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period, before the processor releases the control line, 
whereafter it can be read from the interface register by 
the external apparatus. 

11. An apparatus according to any of claims 2 to 10,
wherein the specific communication instruction is
interpreted by hardware of the processor in accordance
with pre-determined rules.

12. An apparatus according to any of claims 2 to 11,
further comprising an addressable storage space, wherein
at least a first type of specific communication
instruction includes an address field interpreted by the
processor as indicating a location in said storage space,
and wherein the processor is responsive to the specific
communication instruction during said predetermined time
period to allow reading, writing or a selection of the
two to be performed between the interface and the
specified storage location.

13. An apparatus according to claim 12 wherein said
storage space comprises the data space, register space,
and/or program memory of the processor.

14. An apparatus according to claim 13, wherein a memory
access interface of the processor includes means
(LIM_WRITEB) for distinguishing between memory accesses
performed under normal program control and those
performed in response to the communication request.

15. An apparatus according to any of claims 12 to 14,
wherein a read operation may be performed, wherein an
address may be loaded into the address field portion of
the interface register to initiate one read operation
without loading the entire interface register.
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operation of said data processing circuitry, wherein the 
data processing circuitry includes a plurality of 
addressable storage locations, and is responsive to each
communication word received in said interface register
to perform a memory access operation with an address
specified in a predetermined address portion of said
interface register.

21. An apparatus according to claim 20, wherein to
perform a memory read operation said processing circuitry
copies the content of the addressed storage location into
a predetermined data portion of said interface register,
different to said address portion, for asynchronous
readout by said external circuitry.

22. An apparatus according to claim 21, wherein an
address may be loaded into the address field portion of
the interface register to initiate one read operation
without loading the entire interface register.

23. An apparatus according to claim 21 or 22, wherein
data may be read from the data portion of the interface
register to complete one read operation without reading
out the entire interface register.

24. An apparatus according to claim 21, 22 or 23,
wherein data may be read from the data portion of the
interface register to complete one read operation while
an address is being loaded into the address field portion
of the interface register to initiate another read
operation.

25. An apparatus according to any of claims 20 to 24,
wherein a reading or writing operation is selected within
the processor under control of a selection field in the
specific communication instruction loaded into the
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32. An apparatus according to claim 31, wherein said 
time periods are determined by the inclusion of a generic
communication instruction in the stored program.

33. A local communication system comprising at least one
master apparatus and plural slave apparatuses connected
by at least one common serial data writing line, each
slave apparatus having an interface shift register for
receiving a data word from the master apparatus via the
common serial data line, wherein the interface shift
registers of the plural slave apparatuses do not have
equal bit lengths, and wherein the master apparatus is
adapted to send correspondingly different length data
words to the different slave apparatuses.

34. A system according to claim 31, the system further
comprising a clock line, and at least one control line
which can be forced to an active state by either of a
selected slave apparatus and the master apparatus, such
that, in an operation for the transfer of a data word
from the master apparatus into the selected slave
apparatus:

said data word will be loaded into the interface
shift register of the slave apparatus via the serial data
writing line in accordance with a bit clock signal
applied by the master apparatus to the clock line, while
the control line remains in a passive state,

the active state will then be imposed on the control
line by the master apparatus,

the selected slave apparatus will detect the active
level on the control line and itself maintain the active
level while reading the data word internally from the
interface register, and

the slave apparatus will release the control line
when it is ready to receive a next data word.
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apparatus into the slave apparatus; 

said data word can be loaded into an interface
register of the slave apparatus via the serial data line
in accordance with a bit clock signal applied by the
master apparatus to the clock line, while the control
line remains in a passive state,

the active state can then be imposed on the control
line by the master apparatus,

the slave apparatus will detect the active level on
the control line and itself maintain the active level
while reading the data word internally from the interface
register, and

the slave apparatus will release the control line
when it is ready to receive a next data word.

41. An apparatus as claimed in claim 40, wherein a
further data word will not be read internally from the
interface register and processed until the control line
has been released by both apparatuses for a predetermined
time and then again forced to the active level by the
master apparatus.

42. An apparatus according to claim 40 or 41, wherein
at least for a data word which is recognised by the slave
apparatus as a command requiring information in response,
said information is loaded into the interface register
of the slave apparatus before the slave apparatus
releases the control line, whereafter it can be read from
the interface register via a serial data reading line.

43. An apparatus according to claim 42, wherein said
command is interpreted by hardware of the slave apparatus
in accordance with pre-determined rules.

44. An apparatus according to claim 42 or 43, wherein
said data writing line and said data reading line are
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wherein a read operation' may be ' performed , wherein said 
data word, as output by the slave apparatus, includes a 
field for data read from said storage location, and 
wherein said data may be clocked out of the data field 
5 portion of the interface register to complete one read 
operation without clocking out the entire data word . 

50. An apparatus according to claims 48 and 49 combined* 

10 51. An apparatus according to any of claims 45 to 50, 
wherein a reading or writing operation is selected within 
the slave apparatus under control of a selection field 
in the data word clocked into the interface register by 
the master apparatus . 

15 

52. An apparatus having an interface comprising at least
one serial data writing line, a clock line and at least
one control line, the apparatus having the technical
features for operation as the master apparatus in
connection with a slave apparatus according to any of
claims 40 to 51.

53. A program controlled data processor comprising:

- an instruction decoding circuit (300) for
implementing control within the processor in
accordance with a stored program;
- an arithmetic unit (302) having input and output
data paths of width n bits;
- a register (A) of width greater than n bits
connectable under control of the instruction
decoding circuit to at least one of the input and
output paths of the arithmetic unit; and
- a shifting circuit (312) separate from the
arithmetic unit for performing shift operations of
said greater width using said register.

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54. A processor according to claim 53, wherein said
shifting circuit is physically connected between the
output path of the arithmetic unit and data inputs of
said register, with feedback also from data outputs of
said register.

55. A processor according to claim 53 or 54, wherein
said register of greater width comprises a pair of
registers (AH, AL) each of width n, independently
connectable to an input or output path of said arithmetic
unit.

56. A processor according to any of claims 53 to 55,
wherein said instruction decoding circuit is responsive
to a predetermined multiplication instruction to control
the arithmetic unit, the shifting circuit and the
register so as to multiply two values of width n bits,
the result being obtained in the register.

57. A processor according to claim 56, wherein the
multiplication operation is performed over plural
operating cycles of the arithmetic unit and the shifting
circuit.
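
The multi-cycle multiplication of claims 56 and 57 can be pictured with a conventional shift-and-add sketch such as the one below. It is an illustration under stated assumptions rather than the circuit of the specification: n is taken as 16, the operands are treated as unsigned, one loop iteration stands for one operating cycle, and the names `multiply`, `ah` and `al` are chosen for the example only.

```python
N = 16                 # assumed ALU width n
MASK = (1 << N) - 1    # n-bit mask


def multiply(x, y):
    """Unsigned n-by-n multiply; returns the 2n-bit result as (AH, AL)."""
    ah, al = 0, x & MASK   # AL starts as the multiplier, AH as zero
    y &= MASK              # multiplicand fed to the n-bit arithmetic unit
    for _ in range(N):     # one iteration per assumed operating cycle
        carry = 0
        if al & 1:
            total = ah + y             # n-bit add in the arithmetic unit
            carry = total >> N         # carry out of the add
            ah = total & MASK
        # Shift of the wider quantity carry:AH:AL one place to the right,
        # performed outside the arithmetic unit.
        al = (al >> 1) | ((ah & 1) << (N - 1))
        ah = (ah >> 1) | (carry << (N - 1))
    return ah, al
```

After N iterations the register pair holds the full double-width product, so that `(ah << N) | al` equals `x * y` for any two n-bit unsigned inputs.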

58. A processor according to any of claims 53 to 57,
wherein said instruction decoding circuit is responsive
to a predetermined division instruction to control the
arithmetic unit, the shifting circuit and the register
so as to divide two values of width n bits, the quotient
and remainder being obtained in the register.

59. A processor according to claim 58, wherein the 
division operation is performed over plural operating 
cycles of the arithmetic unit and the shifting circuit. 
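
A comparable sketch for the division of claims 58 and 59 is a restoring (shift-and-subtract) loop. Again this is only an assumed illustration: n is taken as 16, operands are unsigned, the quotient is left in the low half and the remainder in the high half of the register pair, and the function name `divide` is hypothetical.

```python
N = 16                 # assumed ALU width n
MASK = (1 << N) - 1    # n-bit mask


def divide(dividend, divisor):
    """Unsigned n-bit divide; returns (remainder, quotient) as (AH, AL)."""
    divisor &= MASK
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    ah, al = 0, dividend & MASK   # AH accumulates the remainder
    for _ in range(N):            # one iteration per assumed operating cycle
        # Shift AH:AL one place to the left; the bit shifted out of AH
        # is kept here so that the comparison below stays exact.
        ah = (ah << 1) | (al >> (N - 1))
        al = (al << 1) & MASK
        # Trial subtraction in the arithmetic unit.
        if ah >= divisor:
            ah -= divisor
            al |= 1               # record a quotient bit
    return ah, al
```

For example, dividing 100 by 7 in this model leaves a remainder of 2 in `ah` and a quotient of 14 in `al`.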

60. A processor according to any of claims 53 to 59, 
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communication request from external circuitry, the
processor being responsive to such a communication
request only during predetermined time periods during
normal program controlled operation.

67. A processor according to claim 66, wherein said
predetermined periods are defined by inclusion of a
predetermined communication instruction in the stored
program.

68. A processor according to any of claims 53 to 67,
wherein said stored program comprises instructions (SIF,
BRK, SLEEP) selected from a predetermined instruction
set, wherein the processor has at least one external
control line, and wherein at least one instruction of the
instruction set is decoded differently depending on a
signal present on the external control line.
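
The idea of decoding one instruction differently according to an external control line can be sketched very simply. In the example below the choice of BRK as the affected instruction, the pairing of "halt" with the asserted line and "no operation" with the de-asserted line, and the function name `decode` are all assumptions made for illustration; the claim itself does not fix them.

```python
def decode(mnemonic, control_line_asserted):
    """Return the action taken for an instruction (illustrative model)."""
    if mnemonic == "BRK":
        # Assumed behaviour: the same op-code either stops execution or
        # does nothing, depending on the external control line.
        return "HALT" if control_line_asserted else "NOP"
    # All other instructions are decoded the same way in both cases.
    return "EXECUTE"
```

A program containing BRK instructions could thus run unmodified in normal use, with the breakpoints taking effect only when the external line is driven during debugging.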

69. A processor according to any of claims 53 to 68,
wherein the processor has a basic instruction cycle
subdivided into plural internal clock states, and wherein,
for at least one combinational logic circuit having
plural input lines and functioning under control of the
stored program, means are provided to sample and latch
input values for the combinational logic circuit only at
a defined state or states within the operational cycle.

70. A processor according to claim 69, wherein said
combinational logic circuit comprises the arithmetic unit
of the processor.

71. A processor according to claim 70, wherein said 
combinational logic circuit further comprises the 
instruction decoding circuit of the processor. 

72. A program controlled data processor comprising: 
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76. A data processing apparatus comprising:

- a data processor circuit adapted to operate
synchronously, under control of a high frequency
clock signal;
- means for imposing a state of low power consumption
in which execution of said program is suspended and
said clock signal is isolated from the processor
circuit, while the clock signal continues running;
and
- monitoring means responsive to any of a plurality
of external signals for ending said suspended state
by re-applying said clock signal to the processor,

wherein said monitoring means comprises:

- plural individual monitoring circuits for detecting
predetermined changes in respective ones of the
external signals; and
- a common trigger circuit responsive to outputs of
the individual monitoring circuits for re-applying
said clock signal to the processor,

and wherein the individual monitoring circuits, but not
the common trigger circuit, are isolated from the running
clock signal during said suspended state.

77. An apparatus according to claim 76, wherein at least
one of said individual monitoring circuits comprises:

- means responsive to said clock signal prior to the
suspended state for storing a value of the
corresponding external signal; and
- asynchronous circuit means for, during the
suspended state, comparing the external signal with
the stored value.
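
The wake-up arrangement of claims 76 and 77 may be easier to follow from a behavioural sketch. The model below is an assumption-laden illustration, not the claimed circuit: each monitor stores the level of its signal when the suspended state is entered and afterwards compares the live level against the stored one, and a single trigger re-applies the clock when any monitor reports a difference. The class names and the treatment of "any change" as the predetermined change are hypothetical.

```python
class SignalMonitor:
    """One individual monitoring circuit (illustrative model)."""

    def __init__(self):
        self.stored = 0

    def store(self, level):
        # Clocked capture of the external signal before suspension.
        self.stored = level

    def changed(self, level):
        # Asynchronous comparison during the suspended state.
        return level != self.stored


class SleepController:
    """Common trigger circuit plus the means imposing the sleep state."""

    def __init__(self, n_signals):
        self.monitors = [SignalMonitor() for _ in range(n_signals)]
        self.suspended = False

    def sleep(self, levels):
        for monitor, level in zip(self.monitors, levels):
            monitor.store(level)
        self.suspended = True      # clock isolated from the processor

    def observe(self, levels):
        """Return True if the clock is applied to the processor."""
        if self.suspended and any(
            m.changed(lv) for m, lv in zip(self.monitors, levels)
        ):
            self.suspended = False  # trigger re-applies the clock
        return not self.suspended
```

With three monitored signals, calling `sleep([0, 1, 0])` and then `observe([0, 1, 0])` leaves the processor suspended, whereas `observe([0, 0, 0])` ends the suspended state.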

78. A data processing apparatus constructed to operate
sequentially under the control of a stored program
comprising instructions selected from a predetermined
instruction set, wherein the apparatus has at least one
external control line (RUN_STEP), and wherein at least
one instruction (BRK) of the instruction set is an
instruction which will either halt the program-controlled
processing operations or will effect no meaningful
operation, depending on said signal.

79. An apparatus according to claim 78, comprising means
(RUN_STEP) signalling externally that execution has
been halted.

80. An apparatus according to any of claims 78 to 79,
wherein said apparatus is integrated on a single
integrated circuit together with said stored program in
non-volatile memory.

81. A simulator or emulator for a program controlled
processor having a predetermined instruction set, the
instruction set including a plurality of specific debug
instructions (PRINT) which, when included in a program
for the processor, implement no operation in the
processor itself but which cause the simulator or
emulator to record information signalling execution of
the respective instruction.

82. A simulator or emulator according to claim 81,
wherein each such debug instruction specifies a data
value to be recorded.

83. A simulator or emulator according to claim 82,
wherein each such debug instruction further specifies a
register location of simulated or emulated processor
whose current contents are also to be recorded.

84. A simulator or emulator according to any of claims
81 to 83, wherein an op-code part common to the specific
debug instructions corresponds to a redundant addressing
mode of another instruction in the instruction set.

85. A simulator or emulator according to any of claims
81 to 84, wherein the op-code part common to the specific
debug instructions corresponds to an immediate addressing
mode of a memory store instruction in the instruction
set.
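
To illustrate how such debug instructions might behave, the following sketch runs the same small program once as the real processor would and once under a simulator. Everything about the toy instruction set (the mnemonics `LOAD_IMM` and `ADD_IMM`, the single register `A`, the tuple encoding and the function `run`) is invented for the example; only the general behaviour, a PRINT instruction that does nothing on the processor but is logged by the simulator together with a data value and a register's contents, follows the text of the claims.

```python
def run(program, simulate):
    """Execute a toy program; returns (registers, debug_log)."""
    regs = {"A": 0}
    log = []
    for instruction in program:
        op = instruction[0]
        if op == "LOAD_IMM":
            regs["A"] = instruction[1] & 0xFFFF
        elif op == "ADD_IMM":
            regs["A"] = (regs["A"] + instruction[1]) & 0xFFFF
        elif op == "PRINT":
            # On the processor itself this op-code has no effect.
            # Only the simulator or emulator records it.
            if simulate:
                value, reg = instruction[1], instruction[2]
                log.append((value, reg, regs[reg]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs, log


program = [
    ("LOAD_IMM", 2),
    ("PRINT", 1, "A"),   # data value 1, register A
    ("ADD_IMM", 3),
    ("PRINT", 2, "A"),   # data value 2, register A
]
```

Running `run(program, simulate=True)` yields a log of each PRINT with the register contents at that point, while `run(program, simulate=False)` produces the same final register state and an empty log.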

86. A processor or data processing apparatus having the
features of any combination of the preceding claims.
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